Judy array

In computer science and software engineering, a Judy array is a data structure that implements an associative array with high performance and low memory usage. Unlike normal arrays, Judy arrays may be sparse; that is, they may have large ranges of unassigned indices. They can be used for storing and looking up values using integer or string keys. The key benefits of using a Judy array are its scalability, high performance, memory efficiency and ease of use.[1]

Judy arrays are both speed- and memory-efficient and require no tuning or configuration. They can therefore replace common data structures (skip lists, linked lists, binary trees, ternary trees, B-trees, hash tables) and work better with very large data sets.

Roughly speaking, a Judy array is similar to a highly optimised 256-ary trie.[2] To keep memory consumption small, Judy arrays use over 20 different compression techniques to compress trie nodes.
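To make the 256-ary trie idea concrete, the following is a minimal sketch of an uncompressed trie that consumes a key one byte at a time. It is not the Judy implementation: the 32-bit key width and the names `new_node`, `insert`, `lookup` and `count_nodes` are assumptions chosen for illustration, and none of Judy's node compression is attempted.

```python
# Toy 256-ary trie over unsigned integer keys.
# Illustration only: real Judy arrays compress their nodes; these do not.

FANOUT = 256    # one child slot per possible byte value
KEY_BYTES = 4   # assumption: 32-bit keys, so the trie is 4 levels deep


def new_node():
    """Create an empty node with one slot per byte value."""
    return [None] * FANOUT


def key_digits(key):
    """Split a key into its bytes, most significant byte first."""
    return key.to_bytes(KEY_BYTES, "big")


def insert(root, key, value):
    """Walk (and create) one node per key byte, then store the value."""
    node = root
    digits = key_digits(key)
    for digit in digits[:-1]:
        if node[digit] is None:
            node[digit] = new_node()
        node = node[digit]
    # Wrap the value in a tuple so that a stored None differs from "absent".
    node[digits[-1]] = (value,)


def lookup(root, key, default=None):
    """Follow one child pointer per key byte; stop early on a missing branch."""
    node = root
    digits = key_digits(key)
    for digit in digits[:-1]:
        node = node[digit]
        if node is None:
            return default
    slot = node[digits[-1]]
    return default if slot is None else slot[0]


def count_nodes(node, depth=1):
    """Count the allocated nodes reachable from this node."""
    if depth == KEY_BYTES:
        return 1
    return 1 + sum(count_nodes(child, depth + 1)
                   for child in node if child is not None)


root = new_node()
insert(root, 200, "a")
insert(root, 70000, "b")
print(lookup(root, 200), lookup(root, 70000), lookup(root, 201))
print(count_nodes(root) * FANOUT, "slots allocated for 2 keys")
```

In this sketch two keys already occupy six nodes of 256 slots each, almost all of them empty, which suggests why compressing sparse nodes matters for memory consumption.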

The Judy array was invented by Douglas Baskins and named after his sister.[3]

Terminology

The terms expanse, population and density are commonly used when discussing Judy arrays. As they are not commonly used in tree search literature, it is important to define them:

  1. Expanse is a range of possible keys, e.g. 200, 300, etc.
  2. Population is the count of keys contained in an expanse, e.g. the keys 200, 360, 400, 512, 720 give a population of 5.
  3. Density describes the sparseness of an expanse of keys: Density = Population / Expanse, where Expanse here means the number of possible keys in the range.
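The three definitions can be worked through with a short calculation. The sketch below reuses the five example keys from the list; the expanse bounds 0 to 1023 and 256 to 511, and the function name `density`, are assumptions made for illustration.

```python
def density(keys, first, last):
    """Return population / expanse for the inclusive key range [first, last]."""
    expanse = last - first + 1                       # number of possible keys
    population = sum(1 for k in set(keys) if first <= k <= last)
    return population / expanse


keys = [200, 360, 400, 512, 720]

# Assumed expanse 0..1023: all five keys fall inside it.
print(density(keys, 0, 1023))    # 5 / 1024 = 0.0048828125

# Assumed expanse 256..511: only 360 and 400 fall inside it.
print(density(keys, 256, 511))   # 2 / 256 = 0.0078125
```

The same set of keys therefore has a different density depending on which expanse is considered.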

Benefits

Memory allocation

Judy arrays are designed to be unbounded arrays, so their sizes are not pre-allocated. They dynamically grow or shrink the memory they use according to the population of the array and can scale to a large number of elements. Since a Judy array allocates memory dynamically as it grows, it is bounded only by machine memory.[4] The memory used by a Judy array is nearly proportional to the number of elements (population) it contains.

Speed

Judy arrays are designed to keep the number of processor cache-line fills as low as possible, and the algorithm is internally complex in an attempt to satisfy this goal as often as possible. Due to these cache optimizations, Judy arrays are fast, sometimes even faster than a hash table, especially for very big datasets. Despite being a type of trie, Judy arrays consume much less memory than hash tables. Also, because a Judy array is a trie, it is possible to do an ordered sequential traversal of keys, which is not possible in hash tables.
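The ordered-traversal property follows from the trie shape rather than from anything specific to Judy, and can be sketched with a plain byte-wise trie over string keys. The `Node` class and the functions `insert` and `items_in_order` below are hypothetical names for this illustration; the sketch does not reflect Judy's internal node layout or its cache optimizations.

```python
class Node:
    """Trie node: children keyed by byte value, plus an optional stored value."""

    def __init__(self):
        self.children = {}
        self.has_value = False
        self.value = None


def insert(root, key, value):
    """Descend one level per byte of the UTF-8 encoded key."""
    node = root
    for byte in key.encode("utf-8"):
        node = node.children.setdefault(byte, Node())
    node.has_value = True
    node.value = value


def items_in_order(node, prefix=b""):
    """Yield (key, value) pairs in byte-wise lexicographic key order."""
    if node.has_value:
        yield prefix.decode("utf-8"), node.value
    # Visiting children by ascending byte value gives sorted output
    # without ever sorting the full set of keys.
    for byte in sorted(node.children):
        yield from items_in_order(node.children[byte], prefix + bytes([byte]))


root = Node()
for word in ["pear", "apple", "app", "banana"]:
    insert(root, word, len(word))

print([key for key, _ in items_in_order(root)])
# ['app', 'apple', 'banana', 'pear']
```

A hash table scatters keys by hash value, so producing the same ordered listing from it would require collecting and sorting all keys first.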

References

  1. http://packages.debian.org/lenny/libjudy-dev
  2. Alan Silverstein, "Judy IV Shop Manual", 2002
  3. http://judy.sourceforge.net/
  4. Kotagiri Ramamohanarao, "Advances in Databases: Concepts, Systems and Applications"
